Method and device for generating 3-D images

ABSTRACT

A method and a device for generating 3D images, according to which an image of a second sequence of images is generated in addition to an image of a first sequence of 2D images at an interval that can be determined via an approximation variable (α). A measure of similarity (d_(k)) between successive images of the first sequence is determined and compared with threshold values (δ₀<δ₁<δ₂) so as to modify the approximation variable (α) depending thereon in such a manner that the stereo base width does not become unnaturally large. A phase analyzer (12) is used to determine a prevailing direction of movement in successive images of the first sequence of images, and a phase converter (16) is used to allocate the images of the first and second sequences of images to a left-hand or right-hand viewing channel depending on that prevailing direction of movement.

The invention relates to a method and a device for the generation of 3-D images.

Three-dimensional imaging is often used to analyze objects, particularly in the fields of medicine and science. Various methods with which television pictures in particular can be produced in three dimensions have also been developed for general consumer applications.

Among said methods, there is a basic distinction between sequential image transmission, in which the images for the right eye and the left eye are transmitted alternately one after the other or saved to a storage medium, and parallel transmission, in which the images are transmitted on two separate channels.

One particular disadvantage of sequential image transmission in connection with conventional television systems is the fact that the refresh rate is reduced to 25 images per second for each eye. This creates an unpleasant flickering for the viewer. Of course, this limitation does not occur when the image sequences are each transmitted on their own channel (left or right). However, problems may still arise with synchronizing both channels and due to the requirements placed on the receiver, which must be able to receive and process two channels simultaneously. This is not possible for most systems generally available on the market.

Signal transmission and processing will likely be entirely digital in future television systems. In such systems, every image is broken down into individual pixels which are transmitted in digitized format. In order to reduce the bandwidth required for this process, the appropriate compression methods are used; however, these create problems for stereo transmission.

For example, using block coding methods with a reasonable rate of compression, it is generally not possible to reconstruct every individual line of an image precisely. In addition, interframe coding techniques, such as MPEG-2, do not allow one to transmit or save stereo images in a sequential image format, because image information from one image is still contained in another image, creating the so-called “crosstalk effect”, which makes clear separation of the right image from the left impossible.

Other methods for generating a three-dimensional image sequence from a two-dimensional image sequence are disclosed in DE 35 30 610 and EP 0 665 697. An autostereoscopic system with an interpolation of images is disclosed in EP 0 520 179, whereas in “Huang: Image Sequence Analysis” (published by Springer Verlag) problems of the recognition of motion areas in image sequences are discussed.

Therefore, the problem behind the invention is to create a method and a device of the type specified in the introduction with which it is possible to generate 3-D images with a very natural three-dimensional image impression even if using the transmission and/or compression methods described in the introduction.

This problem has been solved with a method according to claim 1 and a device according to claim 10.

The dependent claims contain further advantageous embodiments of the invention.

Additional details, features, and advantages of the invention may be seen from the following description of a preferred embodiment with reference to the drawings. They show:

FIG. 1 a schematic block diagram of circuitry according to the invention;

FIG. 2 a graphical representation of an actual image sequence and of a scanned image sequence;

FIG. 3 a-c schematic representations of phase control in sequential images; and

FIG. 4 a schematic block diagram of one imaging application of the invented device.

The basic components of a device according to the invention and their interconnections are schematically represented in FIG. 1. The system comprises a first input E1, through which the two-dimensional images generated by a camera and transmitted across a transmission path are directed to an A/D converter 10 and digitized. The digitized images are then directed to an image storage device 11 and a phase selector 16. The images saved in the image storage device 11 are analyzed by a phase analyzer 12, the input of which is connected to the image storage device 11 and the output of which is connected to the phase selector 16. In addition, a long-term storage device 13 is connected to the image storage device 11 to store images from that device, and its output is connected to an image generator 15. The image generator 15 is also connected to another output of the image storage device 11 and to the output of a motion analyzer 14, to which images from the image storage device 11 are directed. The device further comprises a second input E2 for manual motion control, connected to the image generator 15, and a third input E3 for manual phase control, connected to the phase selector 16. A left and a right stereo image B_(L), B_(R) are present at two outputs of the phase selector 16, which are connected to a first and a second output A1, A2 of the device, respectively.

A second image sequence is generated by this device based on a (first) image sequence recorded in two-dimensions. Together with the first image sequence, the second sequence makes it possible to view the originally two-dimensional images in three dimensions when the first and second image sequences are transmitted to the left or right eye. The second image sequence is defined according to the following description based on image information resulting from the motion in the first image sequence. The following definitions apply:

x_(ij) is a digitized image at time t with horizontal resolution I and vertical resolution J. The scan rate is Δt, so that the following formula is derived for an image scanned at time k and saved in the image storage device 11: $x^{k} := x_{ij}(t - \Delta t \cdot k)$

The most recent K images are located in the image storage device 11 with length K. 0≦α≦k is a real number representing the time interval of a given image x^(k), during which a (synthetic) image of the second image sequence is generated (“approximation variable”). In addition, B_(L) represents the given displayed left image and B_(R) the given displayed right image.

It is assumed that a fixed value is given to α. The images x^(k) in the image storage device 11 are viewed as sample values (scanned image sequence according to curve b in FIG. 2) of a continuous function (actual image sequence according to curve a in FIG. 2). Various methods of approximation may be applied to this function. The following explanations relating to FIG. 2 refer to a linear spline approximation. However, other methods of approximation may be used as appropriate; for example, higher-level or polynomial approximations.

FIG. 2 shows an image sequence in two-dimensional (I/J-) space. The second image sequence is calculated by the image generator 15 as follows: First, α_(u) is calculated as the largest whole number which is smaller than α. Next, α_(o) is calculated as the smallest whole number which is larger than α. So: $B_{L} := x^{0}$ and $B_{R} := x^{\alpha_{o}}(\alpha - \alpha_{u}) + x^{\alpha_{u}}(1 - \alpha + \alpha_{u})$, where the image sequence B_(L) for a left viewing channel (left eye) is given by the actual images of the first image sequence x⁰, x¹, etc., and the (second) image sequence B_(R) is calculated by approximation for a right viewing channel (right eye).

This calculation is performed separately by the image generator 15 for all of the pixels x_(ij) in a selected color space (RGB or YUV); that is: $B_{R} := b_{ij}^{(Y,U,V)} := \left( x_{ij}^{\alpha_{o}}(Y)(\alpha - \alpha_{u}) + x_{ij}^{\alpha_{u}}(Y)(1 - \alpha + \alpha_{u}),\; x_{ij}^{\alpha_{o}}(U)(\alpha - \alpha_{u}) + x_{ij}^{\alpha_{u}}(U)(1 - \alpha + \alpha_{u}),\; x_{ij}^{\alpha_{o}}(V)(\alpha - \alpha_{u}) + x_{ij}^{\alpha_{u}}(V)(1 - \alpha + \alpha_{u}) \right)$.
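
The per-pixel linear spline step can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation of the image generator 15: the function name, the list-of-rows image layout, and the (Y, U, V) tuple per pixel are hypothetical, and α_(u), α_(o) are taken as floor and ceiling so that a whole-number α simply returns the stored image x^(α).

```python
import math


def interpolate_right_image(frames, alpha):
    """Blend two stored frames into a synthetic image B_R.

    frames[k] is the image x^k (k = 0 is the most recent frame). Each image
    is a list of rows and each pixel a (Y, U, V) tuple; this layout is an
    assumption made for the sketch.
    """
    alpha_u = math.floor(alpha)   # assumed reading of "largest whole number below alpha"
    alpha_o = math.ceil(alpha)    # assumed reading of "smallest whole number above alpha"
    w_o = alpha - alpha_u         # weight (alpha - alpha_u) applied to x^{alpha_o}
    w_u = 1.0 - w_o               # weight (1 - alpha + alpha_u) applied to x^{alpha_u}
    img_o, img_u = frames[alpha_o], frames[alpha_u]
    return [
        [
            # each color component is blended separately
            tuple(w_o * c_o + w_u * c_u for c_o, c_u in zip(pix_o, pix_u))
            for pix_o, pix_u in zip(row_o, row_u)
        ]
        for row_o, row_u in zip(img_o, img_u)
    ]
```

With α = 2.1, for instance, the weights are 0.1 for x³ and 0.9 for x², so the synthetic image lies close to the stored image x².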

In addition, automatic phase control is performed by the phase analyzer 12 to determine movements in sequential images of the first image sequence. It is assumed that j_(m):=J/2 is the horizontal midpoint of an image, so x_(ij_m)⁰ with 0≦i≦I is the middle column of the image x⁰ at time t. Furthermore, M<j_(m) is a selected whole number. Then: x_(ij)^(0s):=x_(ij)⁰ with 0≦i≦I and j_(m)−M≦j≦j_(m)+M is defined as a scanned image, shown in vertical stripes in FIG. 3 a. Said image comprises 2M+1 columns around the horizontal midpoint j_(m) of the image x⁰. Now, N is a fixed number with N>M, so: x_(ij)^(1s) with 0≦i≦I and j_(m)−N≦j≦j_(m)+N is defined as the search region (see FIG. 3 b) in image x¹, in which the partial image with the greatest similarity to the scanned image x_(ij)^(0s) is sought.

d_(l) is the similarity of the scanned image to a partial image of equal size from the search region with a displacement position l, where −N≦l≦+N.

If cross-correlation is chosen as a measure of similarity, d_(l) is the result for the displacement position l: Formula (1): $d_{l} := 1 - \sum_{i=0}^{I} \sum_{j=j_{m}-M}^{j_{m}+M} \frac{x_{ij}^{0} \cdot x_{i,j-l}^{1}}{\sqrt{\left(x_{ij}^{0}\right)^{2} \cdot \left(x_{i,j-l}^{1}\right)^{2}}}$

Here, the value of l ranges from −N to +N, where l represents the given displacement position of a partial image in the search region.

As an alternative to cross-correlation, a Euclidean distance or an absolute amount may also be chosen as a measure of similarity.

Thus, with this method, as indicated in FIGS. 3 a and b, the scanned image x^(0s) (FIG. 3 a) runs like a scanner across the search region (FIG. 3 b) of the image x¹ (previous image) and looks for the region with the greatest similarity d_(l) to the scanned image, where the similarity d_(l) is calculated for every displacement position l.
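
The scanning search may be illustrated with the short sketch below. It is written under several assumptions: the images are single-channel lists of rows, the mean absolute difference (one of the alternatives to cross-correlation named above) serves as the measure of similarity, the column index j−l follows Formula (1), and displacement positions at which the strip would leave the image are skipped. The function name is hypothetical.

```python
def best_displacement(current, previous, M, N):
    """Return (l_min, d_min) for the strip search between x^0 and x^1.

    current  -- image x^0 as a list of rows of numbers (one channel)
    previous -- image x^1 in the same layout
    The measure used here is the mean absolute difference, so a smaller
    value means greater similarity.
    """
    width = len(current[0])
    j_m = width // 2                       # horizontal midpoint
    n_pixels = len(current) * (2 * M + 1)  # pixels in the scanned strip
    best_l, best_d = 0, float("inf")
    for l in range(-N, N + 1):
        # columns j - l of the previous image that face the strip
        if j_m - M - l < 0 or j_m + M - l >= width:
            continue                       # assumption: skip positions outside the image
        total = 0.0
        for row_c, row_p in zip(current, previous):
            for j in range(j_m - M, j_m + M + 1):
                total += abs(row_c[j] - row_p[j - l])
        d_l = total / n_pixels
        if d_l < best_d:                   # keep the first minimum found
            best_l, best_d = l, d_l
    return best_l, best_d
```

For a color image the same search would be run on each component separately, or on the luminance component alone; which of these the phase analyzer 12 uses is not stated here.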

In addition, a whole number ε is defined, which may be called the moment of inertia and with which blurring is defined according to FIG. 3 c. This is used to allow for camera movement which should not be considered displacement of the image. For the value of ε, −1≦ε≦1 approximately.

This analysis is performed substantially as follows. First, all measures of similarity d_(l) for −N≦l≦+N are calculated by the phase analyzer 12. Next, the measure of similarity d_(min) with the smallest value is chosen (d_(min):=min d_(l)) and the index l_(min) of this measure of similarity is determined. The values l_(min) and ε are compared by the phase selector 16, and the phase selector 16 switches as a function of the results of the comparison as follows:

If l_(min)<−ε, this means that the region of greatest similarity in the search region is displaced to the left, and thus the predominant direction of movement in sequential images x¹, x⁰ of the first image sequence is indicated from left to right. This may result from the movement of an object in the images from left to right or from the panning of the camera from right to left. In this case, B_(L):=x⁰ (i.e., the given image of the image sequence) is used for the left image and a calculated synthetic image (second image sequence) is selected for the right image B_(R). In addition, a “shift” indicator is set to “left” in the phase selector 16.

If l_(min)>ε, this means that the region of greatest similarity in the search region is displaced to the right, and thus the predominant direction of movement in sequential images x¹, x⁰ of the first image sequence is indicated from right to left. This may result from the movement of an object in the images from right to left or from the panning of the camera from left to right. In this case, a calculated synthetic image (second image sequence) is selected for the left image B_(L), and B_(R):=x⁰ (i.e., the given image of the image sequence) is used for the right image. In addition, the “shift” indicator is set to “right”.

If |l_(min)|<ε and the indicator is set to “right”, then a calculated synthetic image (second image sequence) is selected for the left image B_(L), and B_(R):=x⁰ (i.e., the given image of the image sequence) is used for the right image.

Finally, if |l_(min)|<ε and the indicator is set to “left”, then B_(L):=x⁰ is used for the left image and a calculated synthetic image (second image sequence) is selected for the right image.

The next image is then accessed and the same process is repeated for this image, beginning with the calculation of the minimum value of the measure of similarity d_(min).
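
The switching rule of the phase selector 16 can be summarized in a few lines. The sketch below rests on two assumptions that go beyond the text: the “left” case is taken to be l_min < −ε so that the cases do not overlap, and the boundary case |l_min| = ε is treated like the blurring region, i.e. the indicator is left unchanged. The function and parameter names are hypothetical.

```python
def select_phase(l_min, epsilon, shift, current, synthetic):
    """Assign the given and the synthetic image to the two channels.

    l_min     -- displacement index of the best match
    epsilon   -- width of the blurring region
    shift     -- current state of the "shift" indicator: "left" or "right"
    current   -- the given image x^0 of the first image sequence
    synthetic -- the calculated image of the second image sequence
    Returns (B_L, B_R, shift).
    """
    if l_min < -epsilon:        # assumed sign; movement from left to right
        shift = "left"
    elif l_min > epsilon:       # movement from right to left
        shift = "right"
    # otherwise: inside the blurring region, keep the previous indicator

    if shift == "left":
        return current, synthetic, shift
    return synthetic, current, shift
```

Keeping the indicator as state between frames is what prevents the channels from swapping on every small fluctuation of l_min.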

This automatic phase control or selection may also be switched off and, for example, replaced by manual switching using a keyboard via the device's third input.

Furthermore, the embodiment shown in FIG. 1 comprises the motion analyzer 14, which uses dynamic motion control or motion calculation to prevent the stereo base from becoming too large when there are large movements. In addition, this ensures that a certain minimum width of the stereo base is maintained during very slow movements before it disappears in images without any motion. The long-term storage device 13, from which images are accessed and used as images of the second image sequence when the movements are too slow, has been provided for this last purpose.

The measure of similarity d_(k) at time t_(k) is defined as follows: Formula (2): $d_{k}:={1 - {\sum\limits_{i = 0}^{I}\quad{\sum\limits_{j = 0}^{J}\quad\frac{{x_{ij}^{k} \cdot x_{ij}^{k + 1}}}{\sqrt{\left( x_{ij}^{k} \right)^{2} \cdot \left( x_{ij}^{k + 1} \right)^{2}}}}}}$

Therefore, this measure of similarity is a function of the extent to which the entire contents of the next image in an image sequence differ from the contents of the previous image, and thus represents a measure of the speed of motion in the images.

Threshold values δ₀<δ₁<δ₂ are defined for the analysis of said measure of similarity, where in the ideal case the measure of similarity d_(k)=0 for an unchanged (constant) image at time t_(k) in comparison to the previous image at time t_(k+1). However, because there is always a certain amount of background noise during digitization, it should be assumed that d_(k)<δ₀ for an unchanged image.

A Euclidean distance or an absolute amount may of course be chosen for the calculation instead of the cross-correlation described. The individual color values of the selected color space RGB or YUV must always be processed separately.
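
As an illustration of a measure that is 0 for unchanged images and grows with the amount of change, the sketch below uses a normalized cross-correlation in which the sums are formed before the division. This form is an assumption made for the example and is not asserted to be identical to Formula (2); the function name and the list-of-rows layout are likewise hypothetical. It would be called once per color component.

```python
import math


def dissimilarity(img_a, img_b):
    """Return 1 minus the normalized cross-correlation of two images.

    img_a, img_b -- one color component each, as lists of rows of numbers.
    The result is 0.0 for identical (or proportional) images.
    """
    cross = sum(a * b for row_a, row_b in zip(img_a, img_b)
                for a, b in zip(row_a, row_b))
    energy_a = sum(a * a for row in img_a for a in row)
    energy_b = sum(b * b for row in img_b for b in row)
    if energy_a == 0 or energy_b == 0:
        # assumption: two all-zero images count as unchanged
        return 0.0 if energy_a == energy_b else 1.0
    return 1.0 - cross / math.sqrt(energy_a * energy_b)
```

Because this measure ignores a uniform scaling of brightness, a Euclidean or absolute difference may be the better choice where fades should register as motion.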

To analyze the value of the measure of similarity d_(k) (k=0, 1, ..., K), it is first stored in the motion analyzer 14 and then compared to the threshold values.

If d_(k)<δ₀, this means that the movements in the sequential images are very slow or nil. In this case, the transfer of the values of x^(k) to the long-term storage device 13 is stopped so that images will be available which have a sufficient motion differential. In addition, images stored in the long-term storage device 13 are used to generate the second image sequence in order to maintain the minimum stereo base width.

If d_(k)>δ₀, the value of the approximation variable α will change as a function of the size of the measure of similarity d_(k) relative to the threshold values δ₀, δ₁, δ₂, as follows.

If δ₀<d_(k)<δ₂ and d_(k)−d_(k−1)≦−δ₁ and as long as α<k−1, then the approximation variable is set at α:=α+s.

If δ₀<d_(k)<δ₂ and d_(k)−d_(k−1)>δ₁ and as long as α≧2, then the approximation variable is set at α:=α−s.

The character s denotes a step width, which is preferably 0.1; however, it can have other values as well.

If δ₀<d_(k)<δ₂ and −δ₁<d_(k)−d_(k−1)<δ₁, then the approximation variable will remain at α:=α because the motion velocity is substantially constant. In this case, no adjustment is necessary.

Finally, if δ₂<d_(k), this means that the movement is very fast and the stereo base width would be too large. In this case, the approximation variable is set at α:=1/d_(k).
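
The threshold logic of the dynamic motion control can be written compactly as below. This is a sketch under assumptions: the upper limit for increasing α is read as K−1 with K the length of the image storage device 11, values of d_k exactly equal to δ₀ or δ₂ are treated as lying in the middle range, and the function name and return convention are hypothetical.

```python
def update_alpha(alpha, d_k, d_prev, K, delta0, delta1, delta2, s=0.1):
    """Return (new_alpha, use_long_term_storage) for one analysis step.

    d_k    -- measure of similarity for the current pair of images
    d_prev -- measure of similarity of the preceding step (d_{k-1})
    K      -- number of images held in the image storage (assumed bound)
    """
    if d_k < delta0:
        # very slow or no motion: fall back to the long-term storage
        return alpha, True
    if d_k > delta2:
        # very fast motion: shrink the stereo base
        return 1.0 / d_k, False
    change = d_k - d_prev
    if change <= -delta1 and alpha < K - 1:
        alpha += s              # motion slowed down: widen the time interval
    elif change > delta1 and alpha >= 2:
        alpha -= s              # motion sped up: narrow the time interval
    return alpha, False         # otherwise alpha stays as it is
```

The second return value only signals that images from the long-term storage device 13 should be used; how those images are chosen is outside the scope of this sketch.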

This dynamic motion control can also be switched off like the automatic phase control and replaced by manual entry; for example, using a keyboard via the device's second input.

The method described will preferably be implemented using a data processing program on a computer, in particular a digital image processing system for the generation of a three-dimensional depiction of television pictures transmitted or stored in a two-dimensional format.

In the following, a preferred example with specific values shall be given for the above embodiment. In case of application of the known PAL standard, the horizontal resolution is I=576 and the vertical resolution is J=768, whereas for the NTSC standard, I=480 and J=640 are prescribed.

Generally, it is sufficient to store the last five images in the image storage device 11, which means K:=5. As an initial value α₀, the approximation variable is set to α₀:=2.1. For an adequate analysis of motion in sequential images, the value of M is set to 1 or 2. The value of N should be chosen such that even in case of fast motions the scanned image is still within the search region. For this, a value of N of 20≦N≦30 (especially N:=25) is adequate. However, the search region can also comprise the complete original image, so that N:=J/2.

For defining the blurring, a value of ε:=1 is proposed, whereas for evaluating the measure of similarity the following threshold values are preferably chosen: δ₀:=0.05, δ₁:=0.6 and δ₂:=0.8.
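
For experimentation, the example values can be collected in one place. The container below is purely illustrative: the class and field names are hypothetical, K is read as five stored images, M is set to 2 (the text allows 1 or 2), and the consistency checks merely restate the conditions M<N and δ₀<δ₁<δ₂ from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StereoParams:
    """Example parameter set taken from the preferred embodiment."""
    K: int = 5              # images kept in the image storage
    alpha0: float = 2.1     # initial value of the approximation variable
    M: int = 2              # half-width of the scanned strip
    N: int = 25             # half-width of the search region
    epsilon: int = 1        # blurring region
    delta0: float = 0.05    # thresholds for the measure of similarity
    delta1: float = 0.6
    delta2: float = 0.8
    step: float = 0.1       # step width s

    def validate(self):
        """Raise ValueError if the values contradict the description."""
        if not self.M < self.N:
            raise ValueError("N must be larger than M")
        if not self.delta0 < self.delta1 < self.delta2:
            raise ValueError("thresholds must satisfy delta0 < delta1 < delta2")
        if not 0 <= self.alpha0 <= self.K:
            raise ValueError("alpha0 must lie between 0 and K")
```

A frozen dataclass is used so that a parameter set cannot be altered by accident while a sequence is being processed.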

With an embodiment realized with these values a very natural three-dimensional reproduction could be obtained for image sequences with very differently moving contents.

Finally, FIG. 4 shows a block diagram of a device (stereo decoder or stereo viewer) for the generation and depiction of 3-D images which are calculated based on a sequence of 2-D images transmitted over a transmission path or accessed from a storage medium.

The device comprises a first input 21, to which the 2-D images transmitted across a transmission path and demodulated or decompressed according to known techniques are connected. In addition, there is a second input 22, which is connected to a DVD player, a video recorder, or another source of images, for example.

Both of these inputs are connected to the invented device 23 according to FIG. 1, with which 3-D images are calculated based on the sequence of 2-D images according to the detailed explanation above. The outputs A1, A2 of this device, to which a sequence of left or right images B_(L), B_(R) is connected, are connected to a stereo storage device 24, 25, in which the images are stored for each channel.

Finally, different driver levels can be selected via a third input 26 by activating a selector switch 27, by means of which a corresponding image generator is controlled.

For example, a driver 28 for simulator goggles 29, a driver 30 for an autostereoscopic monitor 31, and a driver 32 for a stereo projector 33 are shown here.

This device is preferably designed as a component of a digital image processing system for the generation of a three-dimensional depiction of television pictures transmitted or stored in two dimensions. 

1-14. (canceled)
 15. A method for generating 3-D images from a first sequence of 2-D images, comprising the following steps: (a) comparing a pair of sequential images in the first image sequence to determine a measure of similarity (d_(k)) between the pair of sequential images; (b) comparing the measure of similarity (d_(k)) to predetermined threshold values (δ₀<δ₁<δ₂); (c) if δ₁<d_(k)<δ₂, (i) setting an approximation variable (α) within a predetermined range to maintain a minimum stereo base width and to prevent the stereo base from becoming too large, and (ii) generating a synthetic image for a second image sequence from the pair of images of the first image sequence by interpolating between the pair of images of the first image sequence based on the approximation variable; (d) if d_(k)<δ₀, setting a temporally previous image of the first image sequence as an image of the second image sequence; and (e) assigning images of the first and second image sequence to a left and right viewing channel, respectively.
 16. A method as set forth in claim 15, wherein if δ₁<d_(k)<δ₂ and d_(k)−d_(k−1)≦−δ₁ and as long as α≦k−1, the approximation variable is set to α:=α+s, and if δ₁<d_(k)<δ₂ and d_(k)−d_(k−1)≧δ₁ and as long as α>2, the approximation variable is set to α:=α−s.
 17. A method as set forth in claim 15, wherein the measure of similarity (d_(k)) is calculated by determining a Euclidean distance or an absolute value.
 18. A method as set forth in claim 15, wherein the images of the second image sequence are calculated by linear spline approximation or a higher-level or polynomial approximation.
 19. A method as set forth in claim 15, wherein in determining the predominant direction of motion, a vertical mid-region of a current image (x⁰) of the first image sequence is compared to different vertical regions of a previous image (x¹) of this sequence and it is determined whether the vertical region of the previous image with the greatest similarity to the mid-region of the current image is situated left or right of center.
 20. A method as set forth in claim 15, wherein in determining the predominant direction of motion, a second measure of similarity (d_(l)) between the image regions is calculated by determination of a Euclidean distance or an absolute value.
 21. A method as set forth in claim 15, wherein a blurring region (ε) with which small movements can be suppressed is established around the mid-region of the current image (x⁰).
 22. A system for generating 3-D images from a first sequence of 2-D images, comprising a first input for receiving the first image sequence; a motion analyzer connected to the first input for comparing a pair of sequential images in the first image sequence to determine a measure of similarity (d_(k)) between the pair of sequential images and for setting an approximation variable (α) that determines a stereo base based on the measure of similarity; an image generator connected to the motion analyzer for generating a synthesized image of a second sequence of images by interpolating between the pair of images of the first image sequence using the approximation variable; first and second outputs for transmitting the first and second image sequences to left and right viewing channels; and a phase selector connected to the image generator for assigning images received from the phase analyzer to one of the left and the right viewing channels.
 23. A system as set forth in claim 16, comprising a phase analyzer interconnected between the image generator and the phase selector for determining a predominant direction of motion in sequential images of the first image sequence, wherein the phase selector assigns images to one of the left and right viewing channels based on the predominant direction of motion determined by the phase analyzer.
 24. A system as set forth in claim 16, comprising an analog-to-digital converter for converting analog image data received at the first input to digital image data.
 25. A system as set forth in claim 18, comprising one or more of (a) a second input connected to the image generator for receiving manual motion control data; and (b) a third input connected to the phase selector for receiving manual phase control data. 